Evaluating force field accuracy with long-time simulations of a 
tryptophan zipper peptide 

N. R. Hayre0 R- R. P. Singh, and D. L. Cox 

Department of Physics, University of California, Davis, California, 95616, USA 
(Dated: June 27, 2010) 

We have combined a custom implementation of the fast multiple-time-stepping LN integrator with parallel 
tempering to explore folding properties of small peptides in implicit solvent on the time scale of microseconds. 
We applied this algorithm to the synthetic /3-hairpin trpzip2 and one of its sequence variants W2W9. Each 
simulation consisted of over 12 /us of aggregated virtual time. Several measures of folding behavior showed 
convergence, allowing comparison with experimental equilibrium properties. Our simulations suggest that the 
electrostatic interaction of tryptophan sidechains is responsible for much of the stability of the native fold. 
We conclude that the ff99 force field combined with ff96 <fi and i/j dihedral energies and implicit solvent can 
reproduce plausible folding behavior in both trpzip2 and W2W9. 



I. INTRODUCTION 

The study of small peptides with well-defined 
secondary-structural motifs gives insight into the basic 
properties of protein systems. The /3-hairpin is one such 
motif that is of particular interest, consisting of two short 
anti-parallel /3-strands connected by a turn. In and of 
themselves, isolated /3-hairpin sequences possess folding 
behavior similar to that of larger proteins, based on sta- 
tistical mechanical considerations 1 ' 2 . As a part of larger 
proteins, the /3-hairpin motif may be important as a 
nucleation site along kinetic folding pathways-. More- 
over, the relatively short timescale of folding of some 
/3-hairpins 4 coupled with their small size makes them 
amenable to all-atom simulation, lending insight into the 
physical system while providing a test of current mod- 
els. Here we focus on a particular hairpin that has been 
subject to several experimental and theoretical studies. 

The tryptophan zippers are a set of six 12- and 16- 
rcsidue engineered /3-hairpin peptides (trpzip 1 to 6), 
each possessing one or more cross-strand tryptophan 
pairs^. The interaction of the indole side chains of tryp- 
tophan (Trp) imparts a stability that is unusual for such 
small proteins. At least in the case of trpzip 1, Trp pair- 
ings appear to be not only sufficient for stability, but 
necessary; when they are removed (via mutations of Trp 
to other hydrophobic residues), there is no evidence for 
hairpin formation 5 . In analogy with mutations of tr- 
pzipl, a recent experiment found that a mutational vari- 
ant of trpzip2 without potential cross-strand Trp pair- 
ings, dubbed W2W9 in reference to the residue indices 
where Trp residues were not mutated to valine, has no 
detectable folding, whereas mutants with intact cross- 
strand pairs are stable hairpins 6 . 

Their unusual stability, small size, and the availability 
of well-constrained native structural models has made 
the trpzip peptides the subject of numerous experimen- 
tal and theoretical-computational studies. Among the 
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latter, several have demonstrated stability and folding of 
trpzip models using OPLS-AAii, AMBER S96^., and 
AMBER ff99SB-i 3 - all-atom molecular mechanics force 
fields. Despite its success in this context, ff96 may ar- 
tificially stabilize extended /3-strand conformations^—, 
and specifically /3-hairpinsi^. 

Herein, we demonstrate that the ff96 parameterization 
of backbone dihedral energies (denoted here dih96) can 
reproduce realistic equilibrium folding properties in tr- 
pzip2 and its W2W9 variant in silico, in agreement with 
experiment. Simulation of the W2W9 variant embodies 
a novel test of force field accuracy, and gives further in- 
sight into /3-hairpin stability. Specifically, we find that 
the electrostatic interaction of cross-strand Trp pairs is 
critical for folding stability. 

To carry out this study, we use a new implementation 
of the fast, accurate multiple-time-stepping (MTS) inte- 
gration algorithm LN (Langevin/Normal-mode)^, cou- 
pled with replica exchange molecular dynamics (REMD) 
for enhanced sampling. REMD gives a picture of the 
thermal equilibrium properties of the system, while the 
MTS integrator facilitates access to longer simulated 
times and thus more nearly-equilibrated structural en- 
sembles. Simulations were long enough to observe con- 
vergence of order parameters from two simulations be- 
gun with very different initial conditions. REMD has the 
disadvantage of obscuring realistic constant-temperature 
folding dynamics and kinetic rates, so we have limited 
ourselves to equilibrium properties. 

II. METHODS 

A. Energetics 

The 12-residue synthetic peptide trpzip2 (sequence S- 
WTWENGKWTWK-NH2) and a mutated form W2W9 
(sequence SWTVENGKWTVK-NH2) were modeled 
with classical molecular dynamics (MD). Solute atoms 
interact by a potential energy whose functional form was 
described by Cornell et al.—. The parameters in this 
potential energy were drawn from the Assisted Model 
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Building with Energy Refinement (AMBER) ff99 set 20 . 
Modifications to these parameters were considered for the 
potential energies of backbone (j> and ip dihedral angle ro- 
tations. The first modification, denoted ff99SB, is a pro- 
posed correction of the backbone dihedral energies based 
on quantum- mechanical calculations 21 , establishing good 
relative energies between different secondary structures. 
The second, denoted in this work as ff99/dih96, consists 
of corrections made by Kollman et at— (ff96) applied to 
the ff99 force field. Specifically, no backbone dihedral en- 
ergy terms are dropped from ff99; they are only modified 
where they differ from those in ff96. 

The energy of electrostatic solvent interaction was ap- 
proximated with the generalized Born (GB) model of 
Hawkins et at 23 . A 0.10 M concentration of mobile 
counterions was assumed for calculation of Debye-Hiickel 
screening. The solute dielectric constant was set to unity, 
and that of the solvent was set to 78.5. 

The solvent-accessible surface area (SASA) model is 
intended to account for the non-polar aspect of the 
conformationally-dependent free energy change of aque- 
ous solvation 2 ^, and it is often used in conjunction with 
GB models under the name GBSA. The SASA model was 
implemented here using the LCPO algorithm 25 , with a 
surface tension of 0.005 kcal/A 2 . 

B. Dynamics - Multiple Time Stepping 

The positions and velocities of atomic point masses 
were evolved with a Langevin equation of mo- 
tion. A multiple-time-stepping scheme, the LN 
(Langevin/normal-mode) algorithm^, was used to ac- 
curately integrate the Langevin equation while improv- 
ing computational efficiency. LN improves upon the 
Langevin/implicit-Euler/normal-mode (LIN) scheme 2 ^ 
by removing its computationally-expensive implicit in- 
tegration. LN achieves significant performance benefits 
over other MTS schemes by favoring force extrapolation 
to impulse, though this choice results in an integrator 
that is neither symplectic nor time-reversible, and which 
requires moderate damping to remain stable. 

The present implementation of LN closely follows the 
introductory worki& in terms of the composition of force- 
splitting groups and distance cutoffs. Following their pro- 
posal, bonded forces were directly calculated instead of 
linearized, for simplicity. What follows is a brief descrip- 
tion of our implementation of LN. 

Terms in the total potential energy function U arc 
grouped according to the inherent length and time scales 
on which significant variations occur: 

u = u f + U m + U S + U c , (1) 

where the subscripts refer to "fast", "medium", "slow", 
and "cutoff" groups. The fast group contains all bonded 
(pairwise, angle, and dihedral) terms. The medium group 
contains all non-bonded (electrostatic, solvation, and van 
der Waals) terms wherein pairwise distance d is less than 



an inner cutoff d m = 5 A. The slow group contains non- 
bonded terms for which d ln < d < d out = 15 A, where 
dout is an outer cutoff. All energy terms for which d > 
d ou t are placed in the cutoff group, and are not evaluated 
or used to calculate forces. While bonded terms are kept 
permanently in the fast group, non-bonded terms were 
periodically reassigned among the other groups in order 
to satisfy the above pairwise distance conditions. 

The total force F on an atom was considered as the 
sum of contributions from each of the potential energy 
groups, excepting U c : 

F = F f (r) + F m (r m ) + F s (r), (2) 

where Fi = — Vf/j. The fast forces Ff are evaluated at ev- 
ery timestep using instantaneous positions. The medium 
forces F m are updated after every interval At m = k m Ar 
using the positions r m = r+^At m v (midpoint extrapola- 
tion) . The slow forces F s are updated after every interval 
At = k s At m using instantaneous positions (constant ex- 
trapolation) . Assuming an arbitrary integration timestep 
labeled n, atomic positions r and velocities v are updated 
according to: 

At 

r n+l/2 — r n H 2~ Vn 

F m = F m (r n + -At m v) if n mod k m = 

F s = F s (r n+1 / 2 ) if n mod k m k s = 

v n + At{F } + F m + F s +R n )/m 

Vn + 1 = t~i — a 

1 + 7 At 

At 

r n +x = r n+ i/ 2 + -^ v n 
(r n+ i,v n+ i) <- RATTLE(r n+ i,u„ + i). 

(3) 

R n is a zero-mean normally-distributed random process 
with variance 2kBTmj/ At, where 7 = 20 ps _1 is a colli- 
sion frequency that determines the strength of tempera- 
ture coupling. Pairwise terms were reassigned to appro- 
priate groups every k m ■ k s timesteps; and Born radii and 
solvent pairlists were recalculated after each evaluation 
of the medium group forces. 

Positions and velocities of hydrogen atoms and their 
heavy-atom bonding partners were constrained using the 
RATTLE algorithm^, which allowed an augmented base 
timestep of At = 2 fs. 

The above algorithm was implemented in-house en- 
tirely in the C++ language and built with the GNU 
Compiler Collection, version 4.3.1. Source code is avail- 
able upon request. Both the validation of this MD algo- 
rithm (as described in the Appendix) and computational 
speed measurements were performed using the AMBER 
9 package 2 ^ as a standard. 
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FIG. 1. The upper three graphs show the small deviations 
of LN energies from AMBER energies by type, expressed as 
percent errors from the AMBER values, (a) Bonded energy is 
the sum of pairwise, angle, and dihedral terms; (b) nonbonded 
energy (denoted NB) is the sum of vacuum van der Waals 
and electrostatic terms; and (c) GB represents the energy of 
generalized Born solvation. The lower graph (d) shows a rapid 
increase of computational rate with the outer timestep of LN. 



C. Performance 

The LN algorithm was compared to AMBER in terms 
of computational speed using a single core of an AMD 
Opteron 2216 system running at 2.4 GHz. As shown 
in Figure [T] AMBER is faster than LN by about a fac- 
tor of two at the lowest outer timestep (corresponding 
to no constant extrapolation and equal timesteps), pre- 
sumably due to more optimized computer code. With 
increasing outer timestep, however, LN rapidly surpasses 
AMBER and approaches an asymptotic maximum above 
60 ns/day. A medium timestep At m = 4 f s (k m = 2) was 
chosen to balance stability and numerical accuracy with 
efficiency An outer timestep of At = 128 fs, correspond- 
ing to k s — 32, was settled upon to balance efficiency 
gain and conservative extrapolation. 



D. Replica Exchange 

A replica exchange scheme was used to improve 
sampling and observe temperature-dependent effects^. 
Replicas were simulated at 20 target temperatures I* 
with i = 1,2,..., 20. Ti and T 20 were set to 200 K 
and 500 K, respectively, and the intermediate values 
were separated with exponentially increasing intervals. 



During simulation, pairs of replicas at adjacent temper- 
atures Ti and Tj+i were exchanged with a probability 
p = mm(l,exp[((3i-p i+1 )(Ui-U i+1 )]). The values ft and 
Ui are the inverse temperature and total potential energy, 
respectively, of the replica at Ti. A round consisted of 
attempting an exchange every 5 ps between temperature- 
adjacent replica pairs in order of increasing tempera- 
ture, except for the last attempt, which was between the 
highest- and lowest-temperature replicas. Rounds of ex- 
change attempts were carried out continuously. 

The convergence of order parameters computed from 
two independent REMD simulations in which the ini- 
tial replica conformations are significantly different (as 
measured by the same order parameter) can be used to 
indicate the degree to which equilibrium is sampled 30 . 
REMD runs were thus performed in pairs, referred to 
as "unfolding" and "folding" runs. The unfolding run 
started with all replicas in the native a-carbon (C a ) con- 
formation of trpzip2 (PDB ID: 1LE1, model The 
folding set started in a fully extended conformation in 
which all the peptide backbone dihedral angles are set at 
7r radians. Initial steric clashes in both ensembles were 
reduced with a steepest-descent minimization of the total 
potential energy. 



E. Analysis 

Configurations and energies were recorded for each 
replica at 10 ps intervals, and these data were used to 
calculate several quantities. The C Q radius of gyration 
R g is a simple measure of the spatial compactness of a 



conformation. Given N C a positions 
their pairwise distance as dij = \ fj — fj|, 
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and defining 
R g is calculated 



(4) 



A specific measure of the difference of a given con- 
formation from the native is provided by the average 
pairwise C a distance root- mean-square deviation D Tms , 
calculated as 



Di = 



1 



2N 



N N 

i=i j=i 



4°) 2 



(5) 



where the lettered superscripts label pairwise distances 
calculated on two different conformations, one of which 
is native. This is a variation on a previous formula 31 , 
slightly modified in its prefactor to be more analogous to 

Rg. 

There are five hydrogen bonds present between the two 
antiparallel /3-strands in the native structure, as well as 
a terminal salt bridge. To provide a sensitive measure of 
the formation of these bonds, we define the deviation of 
each as the difference of the distance between the donor 
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and acceptor atoms and the force field equilibrium dis- 
tance. We further define the fraction of formed native 
H-bonds /h as the average of the count of these bonds, 
including the terminal salt bridge, whose deviations are 
less that 0.3 A. 

For the results involving ff99SB presented below, we 
use a measure of alpha-helical contacts to characterize 
alternate conformations that are markedly different from 
the native structure. A contact is counted as formed 
if the pairwise distance of any carbonyl oxygen to any 
amide nitrogen located four residues closer to the C- 
terminus is less than 3.3 A. This measure is limited, since 
we do not employ an angular criterion to discriminate the 
relative orientation of the donor, proton, and acceptor. 

We employ two simple indicators of the Trp sidechain 
properties. First, the distances between C$\ sidechain 
atoms in each of the cross-strand Trp- Trp residue pairs 
(4,9) and (2,11) indicate how closely-packed the side 
chains are. Second, the relative orientation of the planes 
of the indole rings in each pair is indicated by the copla- 
narity, or the normalized dot product of two vectors that 
lie in each plane, each pointing from C 7 to Cs2 in its 
respective sidechain. 

All potentials of mean force are computed using an ex- 
isting implementation of the multistate Bennett accep- 
tance ratio (MBAR) method^. 



III. RESULTS 

Initial REMD on trpzip2 with ff99SB and a com- 
parison of various AMBER force fields with constant- 
temperature MD demonstrate the sensitivity of the 
molecular mechanics model to changes in and ip dihe- 
dral energy. REMD on trpzip2 with ff99/dih96 shows 
qualitatively accurate folding behavior that depends 
weakly on the nonpolar component of solvation energy. 
REMD on W2W9 with ff99/dih96 gives results that are 
consistent with experiment and reveals factors involved 
in stabilizing the native state. 
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FIG. 2. Thermal variation of order parameters using the 
fF99SB/GB force field, (a) The fraction of formed native H- 
bonds /h in trpzip2 is small, while the number of formed 
a-helical contacts is appreciable, (b) The D Tlns from the na- 
tive structure indicates a markedly non-native structure at all 
temperatures. All ordinates are averages over the final 200 ns 
per replica of simulated time. Note that helical contacts are 
not well- converged, which may be due to a slower timescale 
for equilibration or slow fluctuation in this order parameter. 



potential energy profile shown may be compared to a 
similar result derived from decoy analysis of trpzip2 un- 
der the same force field and the same implicit solvent 
model^i. In that analysis, a different metric of deviation 
from the native conformation (rmsd) was used, and the 
potential energies do not account for thermal effects, as 
noted therein. Nevertheless, a small energetic stability 
of the native state was suggested, whereas the current 
result indicates marginal instability of the native state 
in both potential energy and free energy under canonical 
simulation conditions. 



Folding and unfolding REMD runs of length 700 ns 
were first performed using ff99SB/GB. Figure [5] shows 
the native D lms and formed native backbone hydrogen 
bonds /h for both initial conditions, averaged over the fi- 
nal 200 ns of simulated time. These observables converge 
on values that indicate non-native equilibrium states at 
all temperatures. Average alpha-helical contacts in tr- 
pzip2 are also plotted versus temperature, showing that 
the competing stable state at standard temperature is 
partially alpha-helical. 

Figure [3] shows a potential of mean force and average 
total potential energy versus the -D rms observable. The 
PMF gives evidence for a structural state that is more 
stable than the native and quite dissimilar from it. The 



B. Comparison of AMBER force fields 

We compared several AMBER force fields by com- 
puting the difference in total potential energy between 
the native structure and a competitively stable alter- 
nate structure in short, constant-temperature simula- 
tions. The alternate structure was chosen as the final 
conformation of the replica with the highest occupancy 
at the lowest temperature over the final 200 ns of the 
REMD folding simulation employing ff99SB. Trajectories 
were generated for both the native and alternate con- 
formations with ff99, ff99SB, ff99/dih96, and ff03 force 
fields, and average energy differences were calculated, as 
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FIG. 3. The potential of mean force versus D Ilns at 293 
K shown in (a) indicates that the native state is not glob- 
ally stable under the ff99SB force field in GB solvent. The 
canonical expectation for total potential energy (PE) shown 
in (b) is compared in the text with previous results with the 
same force field. These values were generated from 28 /is of 
aggregated data from both folding and unfolding runs. 



FIG. 4. (a) The fraction of formed native H-bonds /h in tr- 
pzip2 and (b) the D Tms from native versus temperature for the 
ff99/dih96 parameter set with GB and GBSA solvent models. 
Both order parameters clearly indicate one or more confor- 
mational transitions. All ordinates are averages over the final 
200 ns per replica of simulated time. The near convergence 
of the averages of these coordinates demonstrates the degree 
of equilibration reached in this interval. 



shown in Table[I] In addition, a set of runs was performed 
with each force field using the SASA approximation for 
non-polar solvation energy. This comparison of parame- 
ter sets indicates a large stability of the native conformer 
with ff99/dih96. It is also evident that SASA energy does 
not greatly lower the energy of the native conformer for 
any of the tested force fields. 



C. trpzip2 with ff99/dih96 

REMD was performed again using ff99/dih96/GB and 
ff99/dih96/GBSA, given their prediction of larger rela- 
tive native-state energetic stability in the analysis above. 
Graphs of native contacts and D ms versus temperature 
in Figure 0] are indicative of a folding transition favor- 
ing the native conformation. The disparity of midpoint 
temperatures in the thermal transitions for both sol- 
vent models suggests that folding involves more than two 
states. 

Heat capacity versus temperature, shown in Figure [5J 
reinforces this, suggesting two significant transitions oc- 
cur - near 300 K and 375 K for GB, and near 320 K and 
410 K for GBSA. These pairs of transition temperatures 
correspond well with the midpoints of the thermal tran- 
sitions in /h and D rms , respectively. One-dimensional 
(ID) potentials of mean force along the D tms coordinate, 



computed for temperatures near the heat capacity peaks, 
show the onset of transitions as exchanges of free energy 
minima (see Figure [5]) ■ These potentials reveal the pres- 
ence of an unfolded state and at least one intermediate 
state. 

Further evidence for three states can be observed in 
the two-dimensional potentials of mean force shown in 
Figure [7] A PMF versus D rm s and R g shows the native 
state, a significant folding intermediate, and an unfolded 
state that are progressively larger in spatial extent, com- 
mensurate with increasing D mls . Projected onto D ms 
and /h, the PMF shows that the transition from the in- 
termediate to the native state involves the formation of 
most or all the native H-bonds, with no partially-bonded 
intermediates. 

The nature of the folded, intermediate, and unfolded 
states is revealed in greater detail by inspecting the 
thermal change of coordinates involving the native H- 
bond deviations dm and the two cross-strand Trp-Trp 
sidechain Cs distances ^(4,9) and d(2,n) (see Figure [8]). 
Only the GB model is considered, though the proper- 
ties are similar for GBSA. Above 375 K, these coordi- 
nates suggest an unfolded state, where all dm and C51 
distances are large and trend with sequence separation, 
with smaller distances separating pairs closer to the j3- 
turn. Between 300 K and 375 K, both sets of coordi- 
nates are restricted to a narrow range of distances. In 
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TABLE I. The energy difference between the native and the chosen alternate conformer of trpzip2 with variants of the AMBER 
force field and generalized Born implicit solvent. — /SA indicates that the SASA model was included in the simulation. 
AE = i?nat — Salt, so that a negative value indicates a more stable native conformer. For each conformer, averages and errors 
are calculated on 32 simulations of duration 100 ps at temperature 293 K, using the AMBER 9 package. 





ff99 


ff99/SA ff99SB 


ff99SB/SA 


ff99/dih96 


ff99/dih96/SA 


ff96 ff96/SA 


ff03 ff03/SA 


AE (kcal/mol) ^ 


+11.6 


+11.9 -3.5 


-4.3 


-28.1 


-28.6 


-19.2 -19.6 


-1.7 -1.6 


AEsa (kcal/mol) ^ 




-0.22 


-0.32 




-0.35 


-0.33 


-0.33 



a For AE, the average error of the mean is 0.35 kcal/mol. 
b For AEgA, the average error of the mean is 0.02 kcal/mol. 




FIG. 5. The heat capacity of ff99/dih96 vs. temperature 
with and without SASA energy (GB and GBSA, respectively) 
shows two significant transition temperatures. Heat capacity 
is calculated as the first derivative of the interpolating spline 
of the average potential energy versus temperature, using the 
aggregated final 200 ns per replica of folding and unfolding 
runs. 



FIG. 6. Potentials of mean force against fl rms are shown 
for several temperatures, with columns headers labeling the 
solvent model used. Each potential is labeled by the temper- 
ature at which it was evaluated, which is close to a major 
peak in the heat capacity for the respective solvent model, as 
shown in Figure [S] and corresponds to the onset of a shift of 
the global free energy minimum. 



this thermal regime, H-bond distances are uniform and 
more than double their equilibrium bonded lengths on 
average, and the two Csi distances are also nearly equal. 
This indicates that the hairpin is in a non-native col- 
lapsed state, wherein both hydrogen bonds and cross- 
strand Trp-Trp pairs reside close to their lower-energy 
folded states. The uniformity and small values of both 
H-bond deviations and C$\ distances in this temperature 
range are consistent with a native-like, nearly-folded av- 
erage structure for the intermediate. Finally, below 300 
K, H-bond distances settle close to their equilibria, and 
Trp-Trp distances take on values particular to their pack- 
ing geometry. 



D. W2W9 with ff99/dih96 

The W2W9 mutated form of trpzip2 was simulated 
with REMD for 600 ns per replica using fT99/dih96/GB. 
Figure |H] shows heat capacity, /h, and D lms versus 
temperature, with the corresponding results for trpzip2 
shown for comparison. PMFs versus D ms and R g in Fig- 
ure [10] reveals three basins that are, by the D rms measure 
alone, indistinguishable from the unfolded, intermediate, 
and native states of trpzip2. The 2D PMF shows that 
two more compact non-native states join the intermediate 
and unfolded free-energy basins in possessing competing 
stability relative to the native. The native hairpin in 
W2W9 is clearly less stable, with a melting temperature 
near 250 K, though the ID PMF gives an estimate of 
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FIG. 7. Two-dimensional potentials of mean force plotted 
against (a) R g and D rms ; and (b) /h and D rms , indicating the 
abrupt formation of most native H-bond contacts upon the 
transition from the intermediate to the native state. Contours 
are placed at intervals of ksT, and the temperature T = 
298 K corresponds to the first major peak in the heat capacity 
for the GB solvent model. 
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FIG. 8. (a) Pairwise Csi distances (solid lines) and copla- 
narity (dotted lines, secondary ordinate) of cross-strand Trp- 
Trp pairs and (b) deviations of H-bond and salt bridge dis- 
tances from equilibria show more detailed structural features 
of the two major transitions. 



FIG. 9. (a) /h, heat capacity, and (b) Drms vs. temperature 
for W2W9, using ff99/dih96/GB. Averages over folding and 
unfolding runs are shown, and trpzip2 data using the same 
model parameters are shown for comparison. 



31% for the population of structures with D rms < 1.0 at 
275 K. 

The loss of cross-strand Trp pairs uncovers the driving 
factors for stabilizing the intermediate and native states. 
The high-temperature collapse to a native-like interme- 
diate occurs near the same temperature as for trpzip2, 
indicating that dihedral energies drive this collapse. The 
transition to a folded structure occurs much lower in tem- 
perature, and the intermediate state has a protracted 
range of thermal stability. The Trp pairs are thus es- 
sential for stabilizing the native state and increasing the 
cooperativity of the lower-temperature transition. 



IV. DISCUSSION 

The foregoing results present several remarkable fea- 
tures. In implicit solvent, and holding other conditions 
constant, we have observed a substantial sensitivity in 
folding behavior to variation of the <j> and ip dihedral 
potential energies. In particular, while ff99SB dihedral 
energies do not support a stable native state for trpzip2 
in the given simulation conditions, those derived from 
ff96 do so quite well. Through simulations of the W2W9 
mutant, we have found these backbone parameters are 
not sufficient for hairpin stability; electrostatic sidechain 
interactions are also critical. Moreover, our results with 
W2W9 compare well with experiment, suggesting that 
ff99/dih96 is robust for modeling j3- hairpin structures. 

The ff99SB force field with GB implicit solvent results 
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FIG. 10. (a) The ID PMF versus L> rms ; and (b) the 2D 
PMF versus D rmB and R g for W2W9 using ff99/dih96/GB, 
both computed at 275 K from 200 ns per replica from both 
folding and unfolding runs. 



in a stable non-native a-helical state. This provides a 
potential contrast with a recent computational study of 
trpzip2 that used the same force field and TIP3P ex- 
plicit solvent^, which found a folding free energy close 
to -0.5 kcal/mol at 273 K. However, this divergence is un- 
surprising given that previous studies comparing implicit 
and explicit solvent have shown that GB models tend to 
overstabilize a-helical conformations^—. As indicated 
by the potential of mean force versus D lms at 293 K in 
Figure [21 this effect need not be large to destabilize the 
native state in the context of ff99SB. In fact, we observe 
a folding free energy close to +0.5 kcal/mol at 273 K 
(leading to an unstable native state) - a discrepancy of 
close to 1.0 kcal/mol between GB implicit and TIP3P 
explicit solvent. We have not tested the effect of a SASA 
model applied along with ff99SB; however, the results in 
Table Q] show only a slight energetic stabilization of the 
native conformer with SASA energy. 

SASA energy is evidently not a dominant factor in sta- 
bilizing the native state of trpzip2 in the context of the 
ff99/dih96 GBSA model used here. SASA energy shifts 
transition temperatures higher and increases free energy 
changes between these states, but does not significantly 
change their location on the free energy landscape. This 
is a sensible result of the surface area penalty imposed by 
the SASA model, which is expected to generically stabi- 
lize compact structures. Unfortunately, until tested with 
an accurate force field, the underlying importance of non- 
polar effects as modeled with SASA energy will remain 
obscure. 



Our simulations of the W2W9 variant of trpzip2 are 
in qualitative agreement with experiment. W2W9 is not 
expected to have a significant folded population at any 
temperature down to 275 K, which is the lower limit of 
the experimentally investigated range of temperatures*'. 
Indeed, in our REMD simulation of W2W9, we observe a 
diminished near-native population (-D rms < 1-0) of about 
31% at 275 K using ff99/dih96/GB. 

Our results on W2W9 suggest that the electrostatic in- 
teraction of Trp pairs greatly stabilizes the native hairpin 
structure, since the principal effect of the mutation from 
trpzip2 to W2W9 is the removal of such pairs. The im- 
portance of a purely electrostatic effect is clear, since the 
SASA model was not employed in this simulation. This 
finding adds greatly to the putative importance of this in- 
teraction, which has been implicated in the edge-to-face 
packing arrangement of the indole sidechain pairs^. 

It is worth considering the influence of the implicit sol- 
vent model on the stability of the W2W9 hairpin. First, 
we have not included a model of non-polar effects, but 
SASA energy would likely further stabilize the hairpin, as 
we showed in the case of trpzip2. Second, as mentioned 
earlier, GB models tend to shift secondary structure pref- 
erence towards a-helical conformations when compared 
with explicit solvent models. This tendency would be 
expected to further destabilize the observed /3-like native 
conformation. The inaccuracy of the solvent model is 
thus probably underpredicting the stability and popula- 
tion of the folded hairpin. 

Force fields employing the ff96 dihedral parameters 
have produced at least qualitatively accurate results for 
trpzip2£ i n ' 12 i 37 , and our results fall into agreement with 
these. Furthermore, our study of W2W9 provides evi- 
dence for the robustness of these parameters in model- 
ing /3-hairpin structures. We have presented a simple 
but critical test of force field accuracy in reproducing /3- 
hairpin folding based on experimental findings. This is 
only one of many possible comparisons with experiment 
in small peptide systems - variation in primary sequence, 
pH conditions, and temperature, for example - that 
can be useful for evaluating theoretical-computational 
approaches. Moreover, the availability of increasingly 
powerful computational resources and techniques enables 
such evaluations based on equilibrium properties. 
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Appendix: Algorithm Validation 

1. Comparison of Absolute Energies 

In order to validate the LN algorithm in the form de- 
scribed in this work, we compared it with the Langevin 
dynamics algorithm in the AMBER 9 package^ as a 
standard. For identical simulation conditions (initial 
structure, force field, temperature, duration, etc.), we 
might expect the time-averaged energies obtained by us- 
ing each MD algorithm to be the same (within error) over 
many independent trials. 

To test this, we performed 32 independent 100-ps sim- 
ulations for each algorithm, starting from model 1 of the 
experimental trpzip2 structure^. The temperature was 
held fixed at 293 K, the ff99SB force field was used, and 
all other simulation parameters were set to the values 
mentioned in the Methods section. For each algorithm, 
we computed the average of the bonded, nonbonded, and 
solvation energies over the duration of each simulation 
and then over all 32 simulations. In Figure QJi-c, each 
of these three average energies for the LN simulations is 
expressed as a percent deviation from the values for the 
AMBER simulations. This test was repeated for several 
choices of the timesteps Ai m and At in the LN algorithm. 

There is a systematic energy shift between AMBER 
and LN, but it within two percent of the AMBER value 
for all energy components, including the total potential 
energy. The mean energy in LN also varies with outer 
timestep, but only within half a percent of the base value 
at At = 2 fs. The average error in LN temperature when 
At m = 4 fs is +0.58% ± 0.01%. 



2. Comparison of Energy Differences 

Systematic deviations of internal energy and temper- 
ature in LN simulations with larger medium timesteps 
were duly noted in the original work (Figure 6 in Ref. 1 1 81 ) . 
Such deviations are worth serious consideration, since 
stability in biomolecules can be based on energy differ- 
ences as small as a few kcal/mol. Accordingly, what is 
most crucial in the evaluation of LN is how well this al- 
gorithm preserves differences in internal energy between 
different structures, rather than how well it preserves in- 
dividual energy values for a given structure. 

To see how well LN preserves energy differences, we 
obtained a diverse set of trpzip2 structures, simulated 
them with both LN and AMBER according to the pro- 
tocol described in the previous section, and calculated 
the differences in average total energies between all the 
unique pairs in this set of structures. Ideally, when LN 
was used, these differences would be nearly the same as 
when AMBER was used. 

To obtain the diverse set of trpzip2 structures, a pool of 
160 snapshots was obtained at all temperatures at several 
well-separated times during the long REMD simulations 
performed in this study. Clusters were formed from this 



pool using the D rms metric according to an algorithm 
described by Daura£&, with a cutoff of 1.5 A. The 12 
cluster centers that resulted from this analysis formed 
the diverse structure set. 

Figure Ilia shows the differences in average internal 
energy for each unique pair in the resulting set of struc- 
tures. These differences for the two MD algorithms are 
plotted against one another, showing a close correlation. 
Figure [Tib shows the deviation between LN and AM- 
BER for the energy difference of each pair of structures. 
When averaged over all pairs, this deviation is very small, 
at 0.027 ± 0.043 kcal/mol. Evidently, despite the small 
systematic errors in absolute energies and temperature 
seen in the previous section, the LN algorithm preserves 
energy differences well. 
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FIG. 11. A comparison of AMBER and LN using total internal energy differences between unique pairs in a set of twelve 
diverse structures of trpzip2. The energy parameters and simulation conditions were identical to those used for the data in 
Figure [T] (a) Energy differences AE are highly correlated, and (b) the quantity AAE = AEamber — A_Eln is small and 
distributed about zero. Moreover, the deviation of AAE about zero decreases with increased sampling (data not shown). 
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